Welcome to the Biotek Package

The Biotek package of murraylab_tools is designed to make your life easier when analyzing Biotek time series data. Specifically, murraylab_tools.Biotek converts Biotek data to tidy format for easy analysis with Pandas and easy plotting with Seaborn. The package also contains a few convenience functions for some of the things you're likely to do with Biotek experiment time series data, currently including:

Background subtraction.
Endpoint summarization.
Growth curve summarization.
Measurement average over any contiguous window.
Spline fitting to measurement curves.
Smoothed derivative calculation.
OD normalization (or any other time-varying normalization)

murraylab_tools.Biotek also has a plotter class, BiotekCellPlotter, which greatly simplifies plotting individual wells' worth of data, and can put both fluorescence and OD measurement data on the same plot.

Data Tidying

Converting from Biotek output to a tidy, Pandas-readable format is simple. You start with a CSV file of data from Gen5. For example:



In [1]:

    
import murraylab_tools.biotek as mt_biotek
import os

# Note that the input file must be from excel output of a Biotek experiment,
# saved in CSV format. 
data_filename = os.path.join("biotek_examples", "RFP_GFP_traces.csv")
calibration = mt_biotek.calibration_data("10/50/14")
mt_biotek.tidy_biotek_data(data_filename, volume = 5, calibration_dict = calibration)

A tidified version of the data will be created with the same name and location as the original time trace file, with "_tidy" appended to the end of the name (pre-suffix).

Usually you will also want access to some meta-data on your experiment -- most importantly, what plasmids were put in each well, in what concentration. You can add this metadata in the form of a supplementary spreadsheet. The first column of the supplementary spreadsheet is assumed to contain a well number (e.g., "D4" or "A08"). Every other column contains some kind of metadata keyed to that well number, with a name of that metadata given in a header row. You can write that supplementary data file yourself in Excel or notepad.

(There is also some automatic conversion of fluorescence data to micromolar concentrations, using built-in calibration data. See the Automatic Unit Conversion section below for details.)

Here is an example of how to write a supplementary file programmatically. This experiment contains three replicated 2D titrations of a GFP plasmid on one axis (at concentrations of 0.25, 0.5, 1, and 2 nM) and an RFP plasmid on the other axis (at concentrations of 0.5, 1, 2, and 4 nM).



In [2]:

    
import csv, string

rfp_concs = [0.44, 0.88, 2.20, 3.97]
gfp_concs = [0.24, 0.49, 0.98, 2.01]
replicates = [1,2,3]

supplementary_filename = os.path.join("biotek_examples", "RFP_GFP_supplementary.csv")
with open(supplementary_filename, 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Well', 'RFP Plasmid (nM)', 'GFP Plasmid (nM)', 'Replicate'])
    for row in range(4):
        for col in range(4):
            well_names = [string.ascii_uppercase[row] + "%d" % (col+6),
                          string.ascii_uppercase[row] + "%d" % (col+11),
                          string.ascii_uppercase[row+6] + "%d" % (col+1)]
            for rep in replicates:
                writer.writerow([well_names[rep-1], rfp_concs[row], gfp_concs[col], 
                                  rep])
    # Also include metadata for three negative control wells.
    writer.writerow(["E10", 0, 0, 1])
    writer.writerow(["E15", 0, 0, 2])
    writer.writerow(["K5", 0, 0, 3])

This supplementary file looks like:

To use that supplementary data:



In [3]:

    
mt_biotek.tidy_biotek_data(data_filename, supplementary_filename, volume = 5, calibration_dict = calibration)

Now we can easily read our Biotek data using Pandas:



In [4]:

    
import pandas as pd

tidy_filename = os.path.join("biotek_examples", "RFP_GFP_traces_tidy.csv")
df = pd.read_csv(tidy_filename)

Let's peek at the data in its tidy format:



In [5]:

    
df.head()









    Out[5]:







  
    
      
      Channel
      Gain
      Time (sec)
      Time (hr)
      Well
      Measurement
      Units
      Excitation
      Emission
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
    
  
  
    
      0
      mRFP
      100
      0
      0.0
      A6
      16
      AFU
      580
      610
      0.44
      0.24
      1
      mRFP100580610
    
    
      1
      mRFP
      100
      0
      0.0
      A7
      13
      AFU
      580
      610
      0.44
      0.49
      1
      mRFP100580610
    
    
      2
      mRFP
      100
      0
      0.0
      A8
      7
      AFU
      580
      610
      0.44
      0.98
      1
      mRFP100580610
    
    
      3
      mRFP
      100
      0
      0.0
      A9
      12
      AFU
      580
      610
      0.44
      2.01
      1
      mRFP100580610
    
    
      4
      mRFP
      100
      0
      0.0
      A11
      10
      AFU
      580
      610
      0.44
      0.24
      2
      mRFP100580610

As you can see, each row of tidy data describes one well's read value for a single channel and gain at a single time, along with some metadata about that well.

Note that if you use data with OD measurements, you will get separate rows with those OD readings with channel "OD600" (or "OD700" if you use OD700), excitation 600 (or 700, if you use OD 700), emission -1, gain -1, and with units of absorbance. For example:



In [6]:

    
data_filename = os.path.join("biotek_examples", "cell_data_traces.csv")
supplementary_filename = os.path.join("biotek_examples", "cell_data_supplemental.csv")
with open(supplementary_filename, 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Well', 'Row', 'Column'])
    for row in range(8):
        for col in range(12):
            row_str = string.ascii_uppercase[row]
            col_str = str(col+1)
            well    = row_str + col_str
            writer.writerow([well, row_str, col_str])

mt_biotek.tidy_biotek_data(data_filename, supplementary_filename, volume = 500, calibration_dict = calibration)
tidy_filename = os.path.join("biotek_examples", "cell_data_traces_tidy.csv")
cell_df = pd.read_csv(tidy_filename)
cell_df.head()









    Out[6]:







  
    
      
      Channel
      Gain
      Time (sec)
      Time (hr)
      Well
      Measurement
      Units
      Excitation
      Emission
      Row
      Column
      ChanStr
    
  
  
    
      0
      OD600
      -1
      145
      0.040278
      A1
      0.091
      absorbance
      600
      -1
      A
      1
      OD600-1600-1
    
    
      1
      OD600
      -1
      145
      0.040278
      A2
      0.107
      absorbance
      600
      -1
      A
      2
      OD600-1600-1
    
    
      2
      OD600
      -1
      145
      0.040278
      A3
      0.120
      absorbance
      600
      -1
      A
      3
      OD600-1600-1
    
    
      3
      OD600
      -1
      145
      0.040278
      A4
      0.127
      absorbance
      600
      -1
      A
      4
      OD600-1600-1
    
    
      4
      OD600
      -1
      145
      0.040278
      A5
      0.110
      absorbance
      600
      -1
      A
      5
      OD600-1600-1

Now we can use Seaborn to start plotting data quickly and (relatively) easily. In this case, let's look at time traces of all three replicates of each combination of plasmids, first for GFP and then for RFP:



In [7]:

    
import seaborn as sns
import matplotlib.pyplot as plt
rc={'lines.linewidth': 2, 'axes.labelsize': 14, 'axes.titlesize': 14}
sns.set(rc=rc)
%matplotlib inline

gfp_df = df[(df.Gain == 100) & (df.Channel == "deGFP")]
rfp_df = df[(df.Gain == 100) & (df.Channel == "mRFP")]
grid = sns.FacetGrid(gfp_df, col = "GFP Plasmid (nM)", row = "RFP Plasmid (nM)", 
                     hue = "Replicate", margin_titles = True)
grid.map(plt.plot, "Time (hr)", "Measurement")
grid.fig.tight_layout(w_pad=1)
plt.show()

plt.clf()
grid = sns.FacetGrid(rfp_df, col = "GFP Plasmid (nM)", row = "RFP Plasmid (nM)", 
                     hue = "Replicate", margin_titles = True)
grid.map(plt.plot, "Time (hr)", "Measurement")
grid.fig.tight_layout(w_pad=1)
plt.show()









    












    





<Figure size 432x288 with 0 Axes>

Automatic Unit Conversion

Note in the example above that the absolute concentrations of fluorescent proteins were automatically calculated in µM. The murraylab_tools.biotek package has a small database of known calibrations for our lab's Biotek's. It currently knows how to calibrate with deGFP, sfGFP, Citrine, mRFP, CFP, Venus, and Cherry, each at gains 61 and 100 (though note that the calibration data for Citrine, CFP, and Venus are quite out-of-date). The plate ID is pulled from the Biotek's output file. Fluorescent protein identities are pulled from channel names -- if you want your data to be automatically calibrated, your channel name will have to match a known channel exactly, though not case-sensitively.

Calibration data is used as a doubly-nested dictionary of the form

calibration_data[channel][biotek_id][gain] = AFU/µM.

This data structure is best built using the mt_biotek.calibration_data() function. By default, it pulls data from the most recent calibration for each channel. You can instead pull data from calibration from a particular date by providing a date argument, which must be a string of the form mm/dd/yy. You can see which calibrations have been done on which dates with mt_biotek.calibration_dates(), which currently gives:



In [8]:

    
mt_biotek.calibration_dates()









    Out[8]:





{'deGFP': array(['10/04/14', '06/23/18'], dtype=object),
 'Citrine': array(['10/04/14'], dtype=object),
 'mRFP': array(['10/04/14', '06/23/18'], dtype=object),
 'CFP': array(['10/04/14'], dtype=object),
 'Venus': array(['10/04/14'], dtype=object),
 'Cherry': array(['10/04/14'], dtype=object),
 'sfGFP': array(['06/23/18'], dtype=object)}

Calibration data is stored as a CSV file that is installed as part of the murraylab_tools package. It is on the github repository as "murraylab_tools/data/calibration_data.csv". You can also supply your own calibration file when you create your calibration data dictionary by providing a filename parameter:



In [9]:

    
calibration_file = os.path.join("biotek_examples", "example_calibration.csv")
aptamer_calibration = mt_biotek.calibration_data(filename = calibration_file)
aptamer_calibration









    Out[9]:





{'MGapt': {'date': '04/04/99',
  'b1': {100: 10},
  'b2': {100: 12},
  'b3': {100: 13},
  'b4': {100: 100}},
 'Spinach': {'date': '04/04/99',
  'b1': {100: 20},
  'b2': {100: 24},
  'b3': {100: 34},
  'b4': {100: 341}}}

If you don't use a channel containing the name of a calibrated substance, or you use a gain that hasn't been calibrated, then the measurement will be left in units of AFU.

Convenience Functions

Background Subtraction

Given a list of negative control wells, you can use the murraylab_tools.biotek package to subtract average background from your time series data:



In [10]:

    
negative_control_wells = ["E10", "E15", "K5"]
corrected_df = mt_biotek.background_subtract(df, negative_control_wells)
corrected_df.head()









    



/usr/lib/python3.8/site-packages/numpy/lib/function_base.py:393: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/usr/lib/python3.8/site-packages/numpy/core/_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)






    Out[10]:







  
    
      
      index
      Channel
      Gain
      Time (sec)
      Time (hr)
      Well
      Measurement
      Units
      Excitation
      Emission
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
    
  
  
    
      0
      0
      mRFP
      100
      0
      0.0
      A6
      10.666667
      AFU
      580
      610
      0.44
      0.24
      1
      mRFP100580610
    
    
      1
      51
      mRFP
      100
      360
      0.1
      A6
      2.333333
      AFU
      580
      610
      0.44
      0.24
      1
      mRFP100580610
    
    
      2
      102
      mRFP
      100
      720
      0.2
      A6
      12.000000
      AFU
      580
      610
      0.44
      0.24
      1
      mRFP100580610
    
    
      3
      153
      mRFP
      100
      1080
      0.3
      A6
      5.000000
      AFU
      580
      610
      0.44
      0.24
      1
      mRFP100580610
    
    
      4
      204
      mRFP
      100
      1440
      0.4
      A6
      11.000000
      AFU
      580
      610
      0.44
      0.24
      1
      mRFP100580610

background_subtract returns a dataframe of corrected data, which can be analyzed just like the dataframes returned in the example above.

Endpoint Averaging

If you know your fluorescence data plateaus near the end of the run, you may want to quickly find the endpoint fluorescence of each well, averaged over the last few time points. You can quickly do this with the endpoint_averages function of the murraylab_tools.biotek package.



In [11]:

    
endpoint_df = mt_biotek.endpoint_averages(corrected_df)
endpoint_df.head()









    Out[11]:







  
    
      
      Channel
      Gain
      Excitation
      Emission
      Well
      index
      Time (sec)
      Time (hr)
      Measurement
      Units
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
    
  
  
    
      0
      deGFP
      61
      488
      535
      A11
      18466
      43359
      12.044167
      170.933333
      AFU
      0.44
      0.24
      2
      deGFP61488535
    
    
      1
      deGFP
      61
      488
      535
      A12
      18467
      43359
      12.044167
      443.033333
      AFU
      0.44
      0.49
      2
      deGFP61488535
    
    
      2
      deGFP
      61
      488
      535
      A13
      18468
      43359
      12.044167
      85.733333
      AFU
      0.44
      0.98
      2
      deGFP61488535
    
    
      3
      deGFP
      61
      488
      535
      A14
      18469
      43359
      12.044167
      -9.466667
      AFU
      0.44
      2.01
      2
      deGFP61488535
    
    
      4
      deGFP
      61
      488
      535
      A6
      18462
      43359
      12.044167
      132.433333
      AFU
      0.44
      0.24
      1
      deGFP61488535

Note that every numeric field has been averaged over the last five data points, so don't take the time column too seriously here.

Now we can quickly plot endpoint averages for the two color channels:



In [12]:

    
green_endpoints = endpoint_df[(endpoint_df.Channel == "deGFP") & (endpoint_df.Gain == 61)]
sns.lmplot(data = green_endpoints, x = 'GFP Plasmid (nM)', 
              y = 'Measurement', hue = 'RFP Plasmid (nM)', fit_reg = False,
              x_jitter = 0.05)









    Out[12]:





<seaborn.axisgrid.FacetGrid at 0x7ff87cc0b880>



In [13]:

    
red_endpoints = endpoint_df[(endpoint_df.Channel == "mRFP") & (endpoint_df.Gain == 100)]
sns.lmplot(data = red_endpoints, x = 'RFP Plasmid (nM)', 
              y = 'Measurement', hue = 'GFP Plasmid (nM)', fit_reg = False,
              x_jitter = 0.05)









    Out[13]:





<seaborn.axisgrid.FacetGrid at 0x7ff87cc45e50>

Averaging in other windows

You can also take time averages in other windows using the window_averages function. This works much like endpoint_averages, except that instead of averaging over a slice at the end of the experiment, it will average over a window whose start and end you specify. You can specify start and end times using seconds, hours, or a timepoint index.



In [14]:

    
# Averaging over a window near the start of the experiment, 
# indexed by hours
endpoint_df = mt_biotek.window_averages(corrected_df, 2, 4, "hours")
endpoint_df.head()









    Out[14]:







  
    
      
      Channel
      Gain
      Excitation
      Emission
      Well
      index
      Time (sec)
      Time (hr)
      Measurement
      Units
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
    
  
  
    
      0
      deGFP
      61
      488
      535
      A11
      14335
      14199
      3.944167
      74.666667
      AFU
      0.44
      0.24
      2
      deGFP61488535
    
    
      1
      deGFP
      61
      488
      535
      A12
      14336
      14199
      3.944167
      184.516667
      AFU
      0.44
      0.49
      2
      deGFP61488535
    
    
      2
      deGFP
      61
      488
      535
      A13
      14337
      14199
      3.944167
      54.966667
      AFU
      0.44
      0.98
      2
      deGFP61488535
    
    
      3
      deGFP
      61
      488
      535
      A14
      14338
      14199
      3.944167
      -2.533333
      AFU
      0.44
      2.01
      2
      deGFP61488535
    
    
      4
      deGFP
      61
      488
      535
      A6
      14331
      14199
      3.944167
      61.016667
      AFU
      0.44
      0.24
      1
      deGFP61488535



In [15]:

    
# indexed by seconds
endpoint_df = mt_biotek.window_averages(corrected_df, 7200, 14400, 
                                        "seconds")
endpoint_df.head()









    Out[15]:







  
    
      
      Channel
      Gain
      Excitation
      Emission
      Well
      index
      Time (sec)
      Time (hr)
      Measurement
      Units
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
    
  
  
    
      0
      deGFP
      61
      488
      535
      A11
      14335
      14199
      3.944167
      74.666667
      AFU
      0.44
      0.24
      2
      deGFP61488535
    
    
      1
      deGFP
      61
      488
      535
      A12
      14336
      14199
      3.944167
      184.516667
      AFU
      0.44
      0.49
      2
      deGFP61488535
    
    
      2
      deGFP
      61
      488
      535
      A13
      14337
      14199
      3.944167
      54.966667
      AFU
      0.44
      0.98
      2
      deGFP61488535
    
    
      3
      deGFP
      61
      488
      535
      A14
      14338
      14199
      3.944167
      -2.533333
      AFU
      0.44
      2.01
      2
      deGFP61488535
    
    
      4
      deGFP
      61
      488
      535
      A6
      14331
      14199
      3.944167
      61.016667
      AFU
      0.44
      0.24
      1
      deGFP61488535



In [16]:

    
# indexed by index
endpoint_df = mt_biotek.window_averages(corrected_df, 33, 53, "index")
endpoint_df.head()









    Out[16]:







  
    
      
      Channel
      Gain
      Excitation
      Emission
      Well
      index
      Time (sec)
      Time (hr)
      Measurement
      Units
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
    
  
  
    
      0
      deGFP
      61
      488
      535
      A11
      15049
      19239
      5.344167
      101.650794
      AFU
      0.44
      0.24
      2
      deGFP61488535
    
    
      1
      deGFP
      61
      488
      535
      A12
      15050
      19239
      5.344167
      260.126984
      AFU
      0.44
      0.49
      2
      deGFP61488535
    
    
      2
      deGFP
      61
      488
      535
      A13
      15051
      19239
      5.344167
      72.317460
      AFU
      0.44
      0.98
      2
      deGFP61488535
    
    
      3
      deGFP
      61
      488
      535
      A14
      15052
      19239
      5.344167
      -1.920635
      AFU
      0.44
      2.01
      2
      deGFP61488535
    
    
      4
      deGFP
      61
      488
      535
      A6
      15045
      19239
      5.344167
      85.222222
      AFU
      0.44
      0.24
      1
      deGFP61488535

Growth Curve Summarization

To extract features of an OD growth curve, you can use the summarize_growth function. This will do two things for each well in your DataFrame:

It will fit the well's growth curve to a logistic-plus-floor model (logistic growth function plus a constant) and report back the parameters of the best fit.
If an optional growth_threshold parameter is set, it will find the time when growth exceeds a given fraction of maximum growth. (Not currently implemented; coming soon!)

Let's look at some growth data from the cell data that we loaded at the beginning of this notebook:



In [17]:

    
od_df = cell_df[cell_df.Channel == "OD600"]
sns.lineplot(data = od_df, x = "Time (hr)", y = "Measurement", hue = "Row", units = "Column",
             estimator = None)









    Out[17]:





<matplotlib.axes._subplots.AxesSubplot at 0x7ff87e302730>

To get estimates for growth rate and other growth characteristics for this data, we do the following (note that this takes a second or two for each well, so this could take a while):



In [18]:

    
growth_df = mt_biotek.summarize_growth(od_df, channel = "OD600", fixed_init = 0.1)









    



/usr/lib/python3.8/site-packages/scipy/optimize/minpack.py:807: OptimizeWarning: Covariance of the parameters could not be estimated
  warnings.warn('Covariance of the parameters could not be estimated',

To summarize, say, the growth rates of each well:



In [19]:

    
sns.swarmplot(data = growth_df, x = "Row", y = "Rate", hue = "Column")









    Out[19]:





<matplotlib.axes._subplots.AxesSubplot at 0x7ff87c6fbfd0>

If you want to see what one or more fits actually looks like, you can plug the estimated parameters back into the model function for logistic-growth-plus-floor (mt_biotek.logistic_growth).



In [20]:

    
row_df = od_df[od_df.Row == "B"]
for well in row_df.Well.unique():
    summary_df = growth_df[growth_df.Well == well]
    rate  = summary_df.Rate.iloc[0]
    init  = summary_df.Init.iloc[0]
    cap   = summary_df.Cap.iloc[0]
    floor = summary_df.Floor.iloc[0]
    
    well_df = row_df[row_df.Well == well]
    times = well_df["Time (hr)"]
    fit_values = mt_biotek.logistic_growth(times, rate, cap, floor, init)
    
    plt.plot(times, fit_values, lw = 1, color = "black")
    plt.plot(times, well_df.Measurement, lw = 0.5, color = "red")
plt.xlabel("Time (hr)")
plt.ylabel("OD")
plt.title("Logistic fits to row B")









    Out[20]:





Text(0.5, 1.0, 'Logistic fits to row B')

Spline Fit

If you want to work with a denoised version of your data, you can fit a 3rd order spline to that data using the spline_fit function. spline_fit returns a dataframe identical to the original, except that it contains an extra column "uM spline fit", which contains the fit.



In [21]:

    
fit_df = mt_biotek.spline_fit(df, smoothing_factor = 1e4)
fit_df.head()









    



/home/sclamons/Downloads/murraylab_tools_local/murraylab_tools/biotek/biotek.py:836: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  group["spline fit"] = spline(group["Time (sec)"])






    Out[21]:







  
    
      
      Channel
      Gain
      Time (sec)
      Time (hr)
      Well
      Measurement
      Units
      Excitation
      Emission
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
      spline fit
    
  
  
    
      12346
      deGFP
      61
      159
      0.044167
      A11
      0
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
      0.275834
    
    
      12397
      deGFP
      61
      519
      0.144167
      A11
      5
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
      3.140807
    
    
      12448
      deGFP
      61
      879
      0.244167
      A11
      0
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
      6.000001
    
    
      12499
      deGFP
      61
      1239
      0.344167
      A11
      2
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
      8.852971
    
    
      12550
      deGFP
      61
      1599
      0.444167
      A11
      15
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
      11.699274



In [22]:

    
example_fit_df = fit_df[(fit_df.Gain == 100) & (fit_df.Channel == "mRFP") \
                        & (fit_df["GFP Plasmid (nM)"] == 2.01) & (fit_df["RFP Plasmid (nM)"] == 3.97)\
                        & (fit_df.Replicate == 1)]
plt.plot(example_fit_df["Time (hr)"], example_fit_df["Measurement"], color = "blue", label = "Raw Data")
plt.plot(example_fit_df["Time (hr)"], example_fit_df["spline fit"], color = "red", label="Spline Fit")
plt.legend(loc = 2)









    Out[22]:





<matplotlib.legend.Legend at 0x7ff8963176d0>

Note the magic smoothing_factor parameter in the mt_biotek.spline_fit function. This is the parameter s in scipy's UnivariateSpline function. Larger values of s give smoother, more global results. Smaller values of s give more locally-fit results. There's a default value for this, but it's not very good for a lot of data, so you'll often have to find a good one by trial-and-error.

Here are a few examples of spline fits with different smoothing factors. Note that the spline fit with s=1 is perfectly fit to the data.



In [23]:

    
plt.plot(example_fit_df["Time (hr)"], example_fit_df["Measurement"], color = "blue", label = "Raw Data")

for s in [1, 5e3, 1e4]:
    fit_df = mt_biotek.spline_fit(df, smoothing_factor = s)
    example_fit_df = fit_df[(fit_df.Gain == 100) & (fit_df.Channel == "mRFP") \
                            & (fit_df["GFP Plasmid (nM)"] == 2.01) & (fit_df["RFP Plasmid (nM)"] == 3.97)\
                            & (fit_df.Replicate == 1)]
    plt.plot(example_fit_df["Time (hr)"], example_fit_df["spline fit"], label=f"s = {s}")
plt.legend(loc = 2)









    



/home/sclamons/Downloads/murraylab_tools_local/murraylab_tools/biotek/biotek.py:836: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  group["spline fit"] = spline(group["Time (sec)"])
/usr/lib/python3.8/site-packages/scipy/interpolate/fitpack2.py:232: UserWarning: 
The maximal number of iterations maxit (set to 20 by the program)
allowed for finding a smoothing spline with fp=s has been reached: s
too small.
There is an approximation returned but the corresponding weighted sum
of squared residuals does not satisfy the condition abs(fp-s)/s < tol.
  warnings.warn(message)
/usr/lib/python3.8/site-packages/scipy/interpolate/fitpack2.py:232: UserWarning: 
A theoretically impossible result was found during the iteration
process for finding a smoothing spline with fp = s: s too small.
There is an approximation returned but the corresponding weighted sum
of squared residuals does not satisfy the condition abs(fp-s)/s < tol.
  warnings.warn(message)
/home/sclamons/Downloads/murraylab_tools_local/murraylab_tools/biotek/biotek.py:836: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  group["spline fit"] = spline(group["Time (sec)"])
/home/sclamons/Downloads/murraylab_tools_local/murraylab_tools/biotek/biotek.py:836: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  group["spline fit"] = spline(group["Time (sec)"])






    Out[23]:





<matplotlib.legend.Legend at 0x7ff8962d4c10>

Let's look at some splined fits for the GFP data from above:



In [24]:

    
gfp_df = df[(df.Gain == 100) & (df.Channel == "deGFP")]
splined_gfp_df = mt_biotek.spline_fit(gfp_df, smoothing_factor = 1e15)
grid = sns.FacetGrid(splined_gfp_df, col = "GFP Plasmid (nM)", 
                     row = "RFP Plasmid (nM)", 
                     hue = "Replicate", margin_titles = True)
grid.map(plt.plot, "Time (hr)", "spline fit")
grid.map(plt.plot, "Time (hr)", "Measurement")
grid.fig.tight_layout(w_pad=1)
plt.show()









    



/home/sclamons/Downloads/murraylab_tools_local/murraylab_tools/biotek/biotek.py:836: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  group["spline fit"] = spline(group["Time (sec)"])



In [25]:

    
rfp_df = df[(df.Gain == 100) & (df.Channel == "mRFP")]
splined_rfp_df = mt_biotek.spline_fit(rfp_df, smoothing_factor = 1e4)
grid = sns.FacetGrid(splined_rfp_df, col = "GFP Plasmid (nM)", 
                     row = "RFP Plasmid (nM)", 
                     hue = "Replicate", margin_titles = True)
grid.map(plt.plot, "Time (hr)", "Measurement")
grid.map(plt.plot, "Time (hr)", "spline fit")
grid.fig.tight_layout(w_pad=1)
plt.show()

They're pretty good fits, though note the downward-turning tails on the ends of most of them.

Moving average fits

Another simple way to smooth out data is to take a moving average. This can be made with the moving_average_fit function. Change the window size (and thus the level of smoothing) with the window_size parameter. The units of window_size are set by the units parameter, which can be seconds, hours, or index (a number of frames).

NOTE: Unlike spline_fit (or most other convenience functions), the DataFrame returned by moving_average_fit has the Measurement (or other) column replaced by the smoothed fit.

Make sure to use a window size greater than one frame, or you'll get back a lot of NaNs.

As with spline_fit, you can fit against any column, but usually you want to fit "Measurement".



In [26]:

    
window_fit_df = mt_biotek.moving_average_fit(df, column = "Measurement", window_size = 600, units = "seconds")
window_fit_df.head()









    Out[26]:







  
    
      
      Channel
      Gain
      Time (sec)
      Time (hr)
      Well
      Measurement
      Units
      Excitation
      Emission
      RFP Plasmid (nM)
      GFP Plasmid (nM)
      Replicate
      ChanStr
    
  
  
    
      12346
      deGFP
      61
      159
      0.044167
      A11
      1.666667
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
    
    
      12397
      deGFP
      61
      519
      0.144167
      A11
      2.333333
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
    
    
      12448
      deGFP
      61
      879
      0.244167
      A11
      5.666667
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
    
    
      12499
      deGFP
      61
      1239
      0.344167
      A11
      13.000000
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535
    
    
      12550
      deGFP
      61
      1599
      0.444167
      A11
      21.333333
      AFU
      488
      535
      0.44
      0.24
      2
      deGFP61488535

Here are a few examples with different window sizes:



In [27]:

    
plt.plot(example_fit_df["Time (hr)"], example_fit_df["Measurement"], color = "blue", label = "Raw Data")

for window_size in [1e3, 3e3, 1e4]:
    fit_df = mt_biotek.moving_average_fit(df, window_size = window_size, units = "seconds")
    example_fit_df = fit_df[(fit_df.Gain == 100) & (fit_df.Channel == "mRFP") \
                            & (fit_df["GFP Plasmid (nM)"] == 2.01) & (fit_df["RFP Plasmid (nM)"] == 3.97)\
                            & (fit_df.Replicate == 1)]
    plt.plot(example_fit_df["Time (hr)"], example_fit_df["Measurement"], label=f"{window_size:5.0f} second window")
plt.legend(loc = 2)









    Out[27]:





<matplotlib.legend.Legend at 0x7ff87d693310>

Derivative calculation

You can calculate the rate of production of a molecule using the smoothed_derivatives function. This function first fits the function using the moving_window_average function above. It then numerically calculates the derivative of that spline, in units of uM/sec (or U/sec, whatever the actual unit in the "uM" column is).

smoothed_derivatives takes the same parameters as moving_window_average, and passes those through to the latter function.

Here is the gain 100 GFP data from above plotted again as a time derivative:



In [28]:

    
gfp_df = df[(df.Gain == 100) & (df.Channel == "deGFP")]
deriv_df = mt_biotek.smoothed_derivatives(gfp_df)









    



/usr/lib/python3.8/site-packages/pandas/core/generic.py:5302: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value



In [29]:

    
grid = sns.FacetGrid(deriv_df, col = "GFP Plasmid (nM)", row = "RFP Plasmid (nM)", 
                     hue = "Replicate", margin_titles = True)
grid.map(plt.plot, "Time (hr)", "Measurement")
grid.fig.tight_layout(w_pad=1)
plt.show()

Manual Unit Conversion

If you want to manually convert an AFU readout to absolute concentration, you can do so with the raw_to_uM function of murraylab_tools.biotek. To convert, you will need the name of the fluorescent protein (one of "GFP", "Citrine", "RFP", "CFP", "Venus", or "Cherry"), the biotek number (one of "b1", "b2", or "b3"), the gain (61 or 100), and the reaction volume (any number, in µL).



In [30]:

    
AFU = 1000
channel = "deGFP"
biotek = "b1"
gain = 61
volume = 5
calibration_dict = mt_biotek.calibration_data()
mt_biotek.raw_to_uM(calibration_dict, AFU, channel, biotek, gain, volume)









    Out[30]:





0.9792401096748922

OD (or other) Normalization

Remember that cell data we briefly imported at the beginning of this tutorial? There was GFP flourescence data in there alongside the OD data. We often want to normalize fluorescence data by OD, so there's a (slow) function for that:



In [31]:

    
normalized_df = mt_biotek.normalize(cell_df)

Let's look at a couple of slices of that data before and after normalization:



In [32]:

    
# sns.tsplot(cell_df[(cell_df.Row == "A") & (cell_df.Channel == "GFP") & (cell_df.Gain == 61)], 
#            time = "Time (hr)", value = "uM", unit = "Column", subject = "Well", 
#            err_style = "unit_traces")
for row in ["A", "B", "C", "D", "E", "F"]:
    plot_df = cell_df[(cell_df.Row == row) & (cell_df.Column == 1) & \
                      (cell_df.Channel == "deGFP") & (cell_df.Gain == 61)]
    plt.plot(plot_df["Time (hr)"], plot_df["Measurement"], label = row + "1")
plt.xlabel("Time (hr)")
plt.ylabel("uM GFP")
plt.legend()
plt.title("GFP Before Normalization")
plt.show()

for row in ["A", "B", "C", "D", "E", "F"]:
    plot_df = cell_df[(cell_df.Row == row) & (cell_df.Column == 1) & \
                      (cell_df.Channel == "OD600")]
    plt.plot(plot_df["Time (hr)"], plot_df["Measurement"], label = row + "1")
plt.xlabel("Time (hr)")
plt.ylabel("OD600")
plt.legend()
plt.title("OD")
plt.show()

for row in ["A", "B", "C", "D", "E", "F"]:
    plot_df = normalized_df[(normalized_df.Row == row) & (normalized_df.Column == 1) & \
                      (normalized_df.Channel == "deGFP") & (normalized_df.Gain == 61)]
    plt.plot(plot_df["Time (hr)"], plot_df["Measurement"], label = row + "1")
plt.xlabel("Time (hr)")
plt.ylabel(normalized_df.Units.unique()[0])
plt.legend()
plt.title("Normalized GFP")
plt.show()

By default, murraylab_tools.biotek will normalize by OD600. Sometimes you may wish to normalize by another channel -- perhaps OD700. Do do this, use the optional norm_channel argument:

normalized_df = mt_biotek.normalize(cell_df, norm_channel = "OD700")

(That particular code will crash if you try to run it, because there isn't any OD700 data in the DataFrame, but you get the idea.)

If, for some reason, you want to normalize by fluorescence data, you can do that too, but you'll need to specify the gain you want to use, like so:



In [33]:

    
crazy_normalized_df = mt_biotek.normalize(cell_df, norm_channel = "deGFP", norm_channel_gain = 61)

Plotting with OD

The Biotek package includes a convenience function for plotting data with an OD curve in the background, one well at a time. For example:



In [40]:

    
plotter = mt_biotek.BiotekCellPlotter(cell_df, "deGFP", 61)
plotter.add_well("B1", "red", "Well B1")
plotter.add_condition((cell_df.Well == "D1") | (cell_df.Well == "C1"),
                      color = "blue", label = "C1 or D1")
g = plotter.plot(title = "Some cell data")









    





<Figure size 432x288 with 0 Axes>

You can also save the figure, in addition to displaying it, by adding the flag

filename = <the name of your output file>

to the plotter.plot() call. If you don't want to display the figure at all, you can call plotter.plot() with the flag show = False.

If you want two separate figures for your OD and fluorescence data, you can split it up using the split_plots flag:



In [39]:

    
plotter = mt_biotek.BiotekCellPlotter(cell_df, "deGFP", 61)
plotter.add_well("B1", "red", "Well B1")
plotter.add_condition((cell_df.Well == "D1") | (cell_df.Well == "C1"),
                      color = "blue", label = "C1 or D1")
g = plotter.plot(title = "Some cell data", split_plots = True);









    





<Figure size 432x288 with 0 Axes>






    












    





<Figure size 432x288 with 0 Axes>

BiotekCellPlotter has a host of other options. Here are the parameters it can take:

title: String that will go at the top of the figure. Default "".
column: Column containing the data you want plotted. Defaults to "Measurement", which is standard for fluorescence and OD raw data. You'll need to use other columns if you're plotting a DataFrame of derivatives or fits.
split_plots: Boolean controlling how data will be presented. If True, will produce two plots -- one with OD data, one with normalized fluorescence data. If False (default), will show both sets of data on the same plot.
filename: If set, will save this figure in addition to showing it, to a file with the name set by filename.
show: Boolean that sets whether the plot will actually be shown. Default True; set to False if you want to save without viewing it.
figsize: 2-tuple of the size of the figure. Is passed directly to figure creation.
linewidth: Default 1. Passed to plot commands to set line width.
ymax: Optional flag to set a manual maximum y value on the fluorescence axis.
od_ymax: Optional flag to set a manual maximum y value on the OD axis.
show_legend: Boolean that determines whether the legend is displayed.



In [ ]:

	Channel	Gain	Well	Measurement	Units	Excitation	Emission	RFP Plasmid (nM)	GFP Plasmid (nM)	Replicate	ChanStr
0	mRFP	100	A6	16	AFU	580	610	0.44	0.24	1	mRFP100580610
1	mRFP	100	A7	13	AFU	580	610	0.44	0.49	1	mRFP100580610
2	mRFP	100	A8	7	AFU	580	610	0.44	0.98	1	mRFP100580610
3	mRFP	100	A9	12	AFU	580	610	0.44	2.01	1	mRFP100580610
4	mRFP	100	A11	10	AFU	580	610	0.44	0.24	2	mRFP100580610

	Channel	Gain	Time (sec)	Time (hr)	Well	Measurement	Units	Excitation	Emission	Row	Column	ChanStr
0	OD600	-1	145	0.040278	A1	0.091	absorbance	600	-1	A	1	OD600-1600-1
1	OD600	-1	145	0.040278	A2	0.107	absorbance	600	-1	A	2	OD600-1600-1
2	OD600	-1	145	0.040278	A3	0.120	absorbance	600	-1	A	3	OD600-1600-1
3	OD600	-1	145	0.040278	A4	0.127	absorbance	600	-1	A	4	OD600-1600-1
4	OD600	-1	145	0.040278	A5	0.110	absorbance	600	-1	A	5	OD600-1600-1

	index	Channel	Gain	Time (sec)	Time (hr)	Well	Measurement	Units	Excitation	Emission	RFP Plasmid (nM)	GFP Plasmid (nM)	Replicate	ChanStr
0	0	mRFP	100	0	0.0	A6	10.666667	AFU	580	610	0.44	0.24	1	mRFP100580610
1	51	mRFP	100	360	0.1	A6	2.333333	AFU	580	610	0.44	0.24	1	mRFP100580610
2	102	mRFP	100	720	0.2	A6	12.000000	AFU	580	610	0.44	0.24	1	mRFP100580610
3	153	mRFP	100	1080	0.3	A6	5.000000	AFU	580	610	0.44	0.24	1	mRFP100580610
4	204	mRFP	100	1440	0.4	A6	11.000000	AFU	580	610	0.44	0.24	1	mRFP100580610

	Channel	Gain	Excitation	Emission	Well	index	Time (sec)	Time (hr)	Measurement	Units	RFP Plasmid (nM)	GFP Plasmid (nM)	Replicate	ChanStr
0	deGFP	61	488	535	A11	18466	43359	12.044167	170.933333	AFU	0.44	0.24	2	deGFP61488535
1	deGFP	61	488	535	A12	18467	43359	12.044167	443.033333	AFU	0.44	0.49	2	deGFP61488535
2	deGFP	61	488	535	A13	18468	43359	12.044167	85.733333	AFU	0.44	0.98	2	deGFP61488535
3	deGFP	61	488	535	A14	18469	43359	12.044167	-9.466667	AFU	0.44	2.01	2	deGFP61488535
4	deGFP	61	488	535	A6	18462	43359	12.044167	132.433333	AFU	0.44	0.24	1	deGFP61488535

	Channel	Gain	Excitation	Emission	Well	index	Time (sec)	Time (hr)	Measurement	Units	RFP Plasmid (nM)	GFP Plasmid (nM)	Replicate	ChanStr
0	deGFP	61	488	535	A11	14335	14199	3.944167	74.666667	AFU	0.44	0.24	2	deGFP61488535
1	deGFP	61	488	535	A12	14336	14199	3.944167	184.516667	AFU	0.44	0.49	2	deGFP61488535
2	deGFP	61	488	535	A13	14337	14199	3.944167	54.966667	AFU	0.44	0.98	2	deGFP61488535
3	deGFP	61	488	535	A14	14338	14199	3.944167	-2.533333	AFU	0.44	2.01	2	deGFP61488535
4	deGFP	61	488	535	A6	14331	14199	3.944167	61.016667	AFU	0.44	0.24	1	deGFP61488535

	Channel	Gain	Excitation	Emission	Well	index	Time (sec)	Time (hr)	Measurement	Units	RFP Plasmid (nM)	GFP Plasmid (nM)	Replicate	ChanStr
0	deGFP	61	488	535	A11	15049	19239	5.344167	101.650794	AFU	0.44	0.24	2	deGFP61488535
1	deGFP	61	488	535	A12	15050	19239	5.344167	260.126984	AFU	0.44	0.49	2	deGFP61488535
2	deGFP	61	488	535	A13	15051	19239	5.344167	72.317460	AFU	0.44	0.98	2	deGFP61488535
3	deGFP	61	488	535	A14	15052	19239	5.344167	-1.920635	AFU	0.44	2.01	2	deGFP61488535
4	deGFP	61	488	535	A6	15045	19239	5.344167	85.222222	AFU	0.44	0.24	1	deGFP61488535

	Channel	Gain	Time (sec)	Time (hr)	Well	Measurement	Units	Excitation	Emission	RFP Plasmid (nM)	GFP Plasmid (nM)	Replicate	ChanStr	spline fit
12346	deGFP	61	159	0.044167	A11	0	AFU	488	535	0.44	0.24	2	deGFP61488535	0.275834
12397	deGFP	61	519	0.144167	A11	5	AFU	488	535	0.44	0.24	2	deGFP61488535	3.140807
12448	deGFP	61	879	0.244167	A11	0	AFU	488	535	0.44	0.24	2	deGFP61488535	6.000001
12499	deGFP	61	1239	0.344167	A11	2	AFU	488	535	0.44	0.24	2	deGFP61488535	8.852971
12550	deGFP	61	1599	0.444167	A11	15	AFU	488	535	0.44	0.24	2	deGFP61488535	11.699274